Overview

Dataset statistics

Number of variables: 29
Number of observations: 78032
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.3 MiB
Average record size in memory: 232.0 B
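
The memory figures can be cross-checked with simple arithmetic. The sketch below multiplies the reported row count by the reported average record size, and also estimates the per-variable memory sizes listed further down in this report (609.8 KiB for numeric and categorical variables, 76.3 KiB for booleans). The 8-byte and 1-byte value widths and the fixed 128-byte index overhead are assumptions chosen because they reproduce the reported figures; they are not stated in the report.

```python
# Figures copied from the dataset statistics above.
N_ROWS = 78032
AVG_RECORD_BYTES = 232.0

# Assumption: each per-variable memory figure includes a fixed 128-byte index.
ASSUMED_INDEX_BYTES = 128


def total_mib(n_rows, avg_record_bytes):
    """Total size in MiB implied by the row count and average record size."""
    return n_rows * avg_record_bytes / 2**20


def column_kib(n_rows, bytes_per_value, index_bytes=ASSUMED_INDEX_BYTES):
    """Size in KiB of one column, assuming a fixed width per value."""
    return (n_rows * bytes_per_value + index_bytes) / 2**10


print(f"{total_mib(N_ROWS, AVG_RECORD_BYTES):.1f} MiB")  # whole dataset
print(column_kib(N_ROWS, 8), "KiB")  # assumed 8 bytes per value
print(column_kib(N_ROWS, 1), "KiB")  # assumed 1 byte per boolean
```

Under these assumptions the results agree with the reported 17.3 MiB, 609.8 KiB and 76.3 KiB to within rounding.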

Variable types

Numeric: 11
Categorical: 9
Boolean: 9

Alerts

Name has a high cardinality: 22710 distinct values [High cardinality]
Address has a high cardinality: 6618 distinct values [High cardinality]
StreetName has a high cardinality: 669 distinct values [High cardinality]
Location has a high cardinality: 56 distinct values [High cardinality]
NAICSDescr has a high cardinality: 1039 distinct values [High cardinality]
RecordID is highly overall correlated with FID and 3 other fields [High correlation]
X is highly overall correlated with PostalCode and 4 other fields [High correlation]
Y is highly overall correlated with CENT_Y [High correlation]
Ward is highly overall correlated with X and 5 other fields [High correlation]
NAICSCode is highly overall correlated with Location and 1 other field [High correlation]
CENT_X is highly overall correlated with X and 4 other fields [High correlation]
CENT_Y is highly overall correlated with X and 5 other fields [High correlation]
PostalCode is highly overall correlated with X and 5 other fields [High correlation]
Location is highly overall correlated with RecordID and 9 other fields [High correlation]
NAICSCat is highly overall correlated with Location and 1 other field [High correlation]
Year is highly overall correlated with RecordID and 1 other field [High correlation]
Age is highly overall correlated with RecordID and 1 other field [High correlation]
FID is highly overall correlated with RecordID and 5 other fields [High correlation]
BusinessID is highly overall correlated with FID and 1 other field [High correlation]
Y is highly skewed (γ₁ = -120.8370508) [Skewed]
StreetNo is highly skewed (γ₁ = 147.6524357) [Skewed]
RecordID is uniformly distributed [Uniform]
RecordID has unique values [Unique]
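
The high-cardinality alerts are consistent with a simple threshold on the number of distinct values: Location (56 distinct values) is flagged, while PostalCode (37) and NAICSCat (19) are not. The sketch below assumes a threshold of 50, which fits these alerts but is not stated anywhere in the report. The distinct counts are copied from the report.

```python
# Distinct counts copied from the alerts and variable summaries in this report.
DISTINCT_COUNTS = {
    "Name": 22710,
    "Address": 6618,
    "StreetName": 669,
    "PostalCode": 37,
    "Location": 56,
    "NAICSCat": 19,
    "NAICSDescr": 1039,
}

# Assumption: a categorical variable is flagged above 50 distinct values.
ASSUMED_THRESHOLD = 50


def high_cardinality(distinct_counts, threshold=ASSUMED_THRESHOLD):
    """Return the names whose distinct count exceeds the threshold, sorted."""
    return sorted(name for name, n in distinct_counts.items() if n > threshold)


flagged = high_cardinality(DISTINCT_COUNTS)
print(flagged)
```

With this threshold the flagged set is exactly the five variables named in the alerts.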

Reproduction

Analysis started: 2023-04-01 18:21:00.957265
Analysis finished: 2023-04-01 18:22:05.293776
Duration: 1 minute and 4.34 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
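
The reported duration follows directly from the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the reproduction details above.
started = datetime.strptime("2023-04-01 18:21:00.957265", FMT)
finished = datetime.strptime("2023-04-01 18:22:05.293776", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```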

Variables

RecordID
Real number (ℝ)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 78032
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39016.5
Minimum: 1
Maximum: 78032
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3902.55
Q1: 19508.75
median: 39016.5
Q3: 58524.25
95-th percentile: 74130.45
Maximum: 78032
Range: 78031
Interquartile range (IQR): 39015.5

Descriptive statistics

Standard deviation: 22526.042
Coefficient of variation (CV): 0.57734657
Kurtosis: -1.2
Mean: 39016.5
Median Absolute Deviation (MAD): 19508
Skewness: 0
Sum: 3.0445355 × 10⁹
Variance: 5.0742259 × 10⁸
Monotonicity: Strictly increasing
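
The RecordID statistics can be reproduced from first principles, which also shows how the quantile and descriptive figures are defined. The sketch below assumes RecordID is exactly the sequence 1..78032 (consistent with the minimum, maximum, distinct count and strictly increasing order reported above), uses linear interpolation for quantiles, and uses the sample variance (divisor n - 1). Those two conventions are assumptions that happen to match the reported numbers.

```python
import math

N = 78032                # number of observations in the report
ids = range(1, N + 1)    # assumption: RecordID is exactly 1..N


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


mean = sum(ids) / N
variance = sum((v - mean) ** 2 for v in ids) / (N - 1)   # sample variance
std = math.sqrt(variance)
cv = std / mean                                          # coefficient of variation

print(mean, sum(ids), variance)
print(std, cv)
print(quantile(ids, 0.05), quantile(ids, 0.25), quantile(ids, 0.95))
```

Under these assumptions the results agree with the reported mean, sum, variance, standard deviation, coefficient of variation and percentiles to the precision shown in the report.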
Histogram with fixed size bins (bins=50)
Common values

Value                   Count   Frequency (%)
1                       1       < 0.1%
52020                   1       < 0.1%
52027                   1       < 0.1%
52026                   1       < 0.1%
52025                   1       < 0.1%
52024                   1       < 0.1%
52023                   1       < 0.1%
52022                   1       < 0.1%
52021                   1       < 0.1%
52019                   1       < 0.1%
Other values (78022)    78022   > 99.9%

Minimum 10 values

Value   Count   Frequency (%)
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
7       1       < 0.1%
8       1       < 0.1%
9       1       < 0.1%
10      1       < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
78032   1       < 0.1%
78031   1       < 0.1%
78030   1       < 0.1%
78029   1       < 0.1%
78028   1       < 0.1%
78027   1       < 0.1%
78026   1       < 0.1%
78025   1       < 0.1%
78024   1       < 0.1%
78023   1       < 0.1%

X
Real number (ℝ)

Distinct4284
Distinct (%)5.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-79.654547
Minimum-79.80298
Maximum-79.550935
Zeros0
Zeros (%)0.0%
Negative78032
Negative (%)100.0%
Memory size609.8 KiB

Quantile statistics

Minimum-79.80298
5-th percentile-79.743189
Q1-79.68296
median-79.651649
Q3-79.621073
95-th percentile-79.578693
Maximum-79.550935
Range0.25204547
Interquartile range (IQR)0.061886626

Descriptive statistics

Standard deviation0.047541739
Coefficient of variation (CV)-0.00059684903
Kurtosis-0.087208553
Mean-79.654547
Median Absolute Deviation (MAD)0.031068498
Skewness-0.39700919
Sum-6215603.6
Variance0.002260217
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-79.6975994 1968
 
2.5%
-79.64275968 831
 
1.1%
-79.60364656 652
 
0.8%
-79.71222857 508
 
0.7%
-79.61962672 436
 
0.6%
-79.63864759 423
 
0.5%
-79.56936408 390
 
0.5%
-79.6136892 274
 
0.4%
-79.75938361 252
 
0.3%
-79.60455904 248
 
0.3%
Other values (4274) 72050
92.3%
ValueCountFrequency (%)
-79.80298035 6
 
< 0.1%
-79.8014612 5
 
< 0.1%
-79.79447393 6
 
< 0.1%
-79.79439767 3
 
< 0.1%
-79.78884298 6
 
< 0.1%
-79.78871792 137
0.2%
-79.78850259 5
 
< 0.1%
-79.78675536 55
0.1%
-79.78630211 72
0.1%
-79.78452433 64
0.1%
ValueCountFrequency (%)
-79.55093488 15
< 0.1%
-79.55280776 8
< 0.1%
-79.55341309 4
 
< 0.1%
-79.55391093 6
 
< 0.1%
-79.55445215 7
< 0.1%
-79.55472553 9
< 0.1%
-79.55507028 6
 
< 0.1%
-79.55523334 5
 
< 0.1%
-79.55532738 5
 
< 0.1%
-79.55542565 4
 
< 0.1%

Y
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct4284
Distinct (%)5.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean43.607593
Minimum0
Maximum43.732864
Zeros5
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size609.8 KiB

Quantile statistics

Minimum0
5-th percentile43.517559
Q143.576985
median43.608267
Q343.649308
95-th percentile43.698595
Maximum43.732864
Range43.732864
Interquartile range (IQR)0.072322677

Descriptive statistics

Standard deviation0.35296606
Coefficient of variation (CV)0.0080941423
Kurtosis14926.763
Mean43.607593
Median Absolute Deviation (MAD)0.036033607
Skewness-120.83705
Sum3402787.7
Variance0.12458504
MonotonicityNot monotonic
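
The extreme skewness and kurtosis of Y are what a handful of zeros would produce in an otherwise narrow range: the report shows a minimum of 0 with 5 zero values, while the 5-th percentile is already 43.517559. The zeros plausibly stand in for missing coordinates, although the report does not say so. The sketch below uses synthetic values (not the real data) and the plain population skewness formula, which may differ slightly from the bias-corrected estimator a profiling tool uses, to show how a single zero dominates the statistic.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic latitudes, symmetric around 43.6 (illustrative, not the real data).
clean = [43.6 + (i - 1000) * 1e-4 for i in range(2001)]
with_placeholder = clean + [0.0]   # one zero standing in for a missing value

print(skewness(clean))             # close to zero for symmetric data
print(skewness(with_placeholder))  # strongly negative

# Dropping the zero restores a near-symmetric distribution.
filtered = [v for v in with_placeholder if v != 0.0]
print(skewness(filtered))
```

Whether the zeros should be dropped or treated as missing depends on how the coordinates were produced, which the report does not document.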
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
43.51755854 1968
 
2.5%
43.59351505 831
 
1.1%
43.67999884 652
 
0.8%
43.55837136 508
 
0.7%
43.57693412 436
 
0.6%
43.72011759 423
 
0.5%
43.5935916 390
 
0.5%
43.6325595 274
 
0.4%
43.58207115 252
 
0.3%
43.62508971 248
 
0.3%
Other values (4274) 72050
92.3%
ValueCountFrequency (%)
0 5
< 0.1%
43.48517014 10
< 0.1%
43.48968489 5
< 0.1%
43.4915708 5
< 0.1%
43.49199992 10
< 0.1%
43.49224252 3
 
< 0.1%
43.49454092 6
< 0.1%
43.49517064 4
 
< 0.1%
43.49608236 9
< 0.1%
43.49636475 5
< 0.1%
ValueCountFrequency (%)
43.73286372 38
< 0.1%
43.73233211 5
 
< 0.1%
43.73196635 6
 
< 0.1%
43.73068152 8
 
< 0.1%
43.72935757 6
 
< 0.1%
43.72770692 6
 
< 0.1%
43.72552272 11
 
< 0.1%
43.72537511 8
 
< 0.1%
43.7250583 5
 
< 0.1%
43.7248112 10
 
< 0.1%

FID
Real number (ℝ)

Distinct16518
Distinct (%)21.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7823.2043
Minimum1
Maximum16518
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size609.8 KiB

Quantile statistics

Minimum1
5-th percentile781
Q13902
median7804
Q311705.25
95-th percentile14902
Maximum16518
Range16517
Interquartile range (IQR)7803.25

Descriptive statistics

Standard deviation: 4538.5029
Coefficient of variation (CV): 0.58013351
Kurtosis: -1.1665353
Mean: 7823.2043
Median Absolute Deviation (MAD): 3902
Skewness: 0.024756244
Sum: 6.1046028 × 10⁸
Variance: 20598009
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 5
 
< 0.1%
9727 5
 
< 0.1%
9729 5
 
< 0.1%
9730 5
 
< 0.1%
9731 5
 
< 0.1%
9732 5
 
< 0.1%
9733 5
 
< 0.1%
9734 5
 
< 0.1%
9735 5
 
< 0.1%
9736 5
 
< 0.1%
Other values (16508) 77982
99.9%
ValueCountFrequency (%)
1 5
< 0.1%
2 5
< 0.1%
3 5
< 0.1%
4 5
< 0.1%
5 5
< 0.1%
6 5
< 0.1%
7 5
< 0.1%
8 5
< 0.1%
9 5
< 0.1%
10 5
< 0.1%
ValueCountFrequency (%)
16518 1
< 0.1%
16517 1
< 0.1%
16516 1
< 0.1%
16515 1
< 0.1%
16514 1
< 0.1%
16513 1
< 0.1%
16512 1
< 0.1%
16511 1
< 0.1%
16510 1
< 0.1%
16509 1
< 0.1%

BusinessID
Real number (ℝ)

Distinct21240
Distinct (%)27.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean34656.267
Minimum2
Maximum94424
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size609.8 KiB

Quantile statistics

Minimum2
5-th percentile2230
Q19764
median19182.5
Q355026
95-th percentile88915
Maximum94424
Range94422
Interquartile range (IQR)45262

Descriptive statistics

Standard deviation: 29857.312
Coefficient of variation (CV): 0.86152708
Kurtosis: -0.99364033
Mean: 34656.267
Median Absolute Deviation (MAD): 16019.5
Skewness: 0.65057392
Sum: 2.7042978 × 10⁹
Variance: 8.9145909 × 10⁸
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1055 5
 
< 0.1%
20882 5
 
< 0.1%
19580 5
 
< 0.1%
20871 5
 
< 0.1%
19831 5
 
< 0.1%
19332 5
 
< 0.1%
19583 5
 
< 0.1%
19832 5
 
< 0.1%
19584 5
 
< 0.1%
20872 5
 
< 0.1%
Other values (21230) 77982
99.9%
ValueCountFrequency (%)
2 2
 
< 0.1%
7 5
< 0.1%
10 5
< 0.1%
12 3
< 0.1%
16 5
< 0.1%
18 5
< 0.1%
20 5
< 0.1%
21 5
< 0.1%
23 5
< 0.1%
26 4
< 0.1%
ValueCountFrequency (%)
94424 1
< 0.1%
94423 1
< 0.1%
94419 1
< 0.1%
94371 1
< 0.1%
94321 1
< 0.1%
94319 1
< 0.1%
94318 1
< 0.1%
94317 1
< 0.1%
94313 1
< 0.1%
94293 1
< 0.1%

Name
Categorical

Distinct22710
Distinct (%)29.1%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
Subway
 
212
Tim Hortons
 
181
Petro Canada
 
123
Shoppers Drug Mart
 
102
Tim Horton's
 
97
Other values (22705)
77317 

Length

Max length: 118
Median length: 76
Mean length: 22.654539
Min length: 1

Characters and Unicode

Total characters: 1767779
Distinct characters: 93
Distinct categories: 15
Distinct scripts: 2
Distinct blocks: 3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5010
Unique (%): 6.4%

Sample

1st row: Golf Trends Inc.
2nd row: Apex Graphics Inc.
3rd row: Sands, John & Associates Limited
4th row: Printmedia-Tackaberry Times
5th row: S W R Industries Ltd.

Common Values

ValueCountFrequency (%)
Subway 212
 
0.3%
Tim Hortons 181
 
0.2%
Petro Canada 123
 
0.2%
Shoppers Drug Mart 102
 
0.1%
Tim Horton's 97
 
0.1%
PLASP Child Care Centre 96
 
0.1%
Dollarama 92
 
0.1%
Starbucks 88
 
0.1%
Shell Canada 84
 
0.1%
Royal Bank of Canada 78
 
0.1%
Other values (22700) 76879
98.5%
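
The common values list "Tim Hortons" (181) and "Tim Horton's" (97) separately, which appear to be two spellings of the same chain. Part of the high cardinality of Name may therefore come from spelling variants. The sketch below shows one possible normalisation; the rule (lowercase, drop apostrophes, collapse whitespace) is a hypothetical choice for illustration, and the counts are copied from the table above.

```python
from collections import Counter

# Counts copied from the Common Values table above.
COMMON = {"Tim Hortons": 181, "Tim Horton's": 97, "Subway": 212}


def normalize(name):
    """Lowercase, drop apostrophes and collapse whitespace (hypothetical rule)."""
    cleaned = name.lower().replace("'", "").replace("\u2019", "")
    return " ".join(cleaned.split())


merged = Counter()
for name, count in COMMON.items():
    merged[normalize(name)] += count

print(merged.most_common())
```

With this rule the two spellings collapse into a single key with 278 records, which would rank above Subway. A rule this aggressive can also merge names that are genuinely different, so it is better suited to exploration than to automatic cleaning.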

Length

Histogram of lengths of the category
ValueCountFrequency (%)
inc 15794
 
5.7%
9127
 
3.3%
ltd 7946
 
2.9%
canada 4795
 
1.7%
centre 2969
 
1.1%
and 2598
 
0.9%
services 2443
 
0.9%
the 2359
 
0.8%
a 2092
 
0.8%
of 2044
 
0.7%
Other values (16113) 225478
81.2%

Most occurring characters

ValueCountFrequency (%)
199927
 
11.3%
e 132589
 
7.5%
a 128136
 
7.2%
n 115216
 
6.5%
i 104250
 
5.9%
r 101893
 
5.8%
o 97613
 
5.5%
t 94807
 
5.4%
s 77470
 
4.4%
l 62777
 
3.6%
Other values (83) 653101
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1236769
70.0%
Uppercase Letter 275469
 
15.6%
Space Separator 199927
 
11.3%
Other Punctuation 44368
 
2.5%
Decimal Number 4222
 
0.2%
Dash Punctuation 4194
 
0.2%
Close Punctuation 1272
 
0.1%
Open Punctuation 1266
 
0.1%
Math Symbol 178
 
< 0.1%
Final Punctuation 99
 
< 0.1%
Other values (5) 15
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 132589
10.7%
a 128136
10.4%
n 115216
9.3%
i 104250
 
8.4%
r 101893
 
8.2%
o 97613
 
7.9%
t 94807
 
7.7%
s 77470
 
6.3%
l 62777
 
5.1%
c 60202
 
4.9%
Other values (20) 261816
21.2%
Uppercase Letter
ValueCountFrequency (%)
C 35962
13.1%
S 28667
 
10.4%
I 23883
 
8.7%
M 18395
 
6.7%
L 18128
 
6.6%
A 17083
 
6.2%
P 16975
 
6.2%
T 15559
 
5.6%
D 13515
 
4.9%
B 11145
 
4.0%
Other values (17) 76157
27.6%
Other Punctuation
ValueCountFrequency (%)
. 29521
66.5%
& 7166
 
16.2%
, 3463
 
7.8%
' 3108
 
7.0%
/ 898
 
2.0%
: 88
 
0.2%
# 35
 
0.1%
@ 29
 
0.1%
! 26
 
0.1%
" 16
 
< 0.1%
Other values (2) 18
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 906
21.5%
2 760
18.0%
0 712
16.9%
4 418
9.9%
3 334
 
7.9%
9 287
 
6.8%
8 245
 
5.8%
7 197
 
4.7%
5 184
 
4.4%
6 179
 
4.2%
Math Symbol
ValueCountFrequency (%)
+ 152
85.4%
| 25
 
14.0%
> 1
 
0.6%
Close Punctuation
ValueCountFrequency (%)
) 1264
99.4%
] 8
 
0.6%
Space Separator
ValueCountFrequency (%)
199927
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4194
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1266
100.0%
Final Punctuation
ValueCountFrequency (%)
99
100.0%
Control
ValueCountFrequency (%)
6
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
3
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 2
100.0%
Other Symbol
ValueCountFrequency (%)
© 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1512238
85.5%
Common 255541
 
14.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 132589
 
8.8%
a 128136
 
8.5%
n 115216
 
7.6%
i 104250
 
6.9%
r 101893
 
6.7%
o 97613
 
6.5%
t 94807
 
6.3%
s 77470
 
5.1%
l 62777
 
4.2%
c 60202
 
4.0%
Other values (47) 537285
35.5%
Common
ValueCountFrequency (%)
199927
78.2%
. 29521
 
11.6%
& 7166
 
2.8%
- 4194
 
1.6%
, 3463
 
1.4%
' 3108
 
1.2%
( 1266
 
0.5%
) 1264
 
0.5%
1 906
 
0.4%
/ 898
 
0.4%
Other values (26) 3828
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1767601
> 99.9%
Punctuation 102
 
< 0.1%
None 76
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
199927
 
11.3%
e 132589
 
7.5%
a 128136
 
7.2%
n 115216
 
6.5%
i 104250
 
5.9%
r 101893
 
5.8%
o 97613
 
5.5%
t 94807
 
5.4%
s 77470
 
4.4%
l 62777
 
3.6%
Other values (75) 652923
36.9%
Punctuation
ValueCountFrequency (%)
99
97.1%
3
 
2.9%
None
ValueCountFrequency (%)
é 67
88.2%
ü 4
 
5.3%
ē 2
 
2.6%
É 1
 
1.3%
ä 1
 
1.3%
© 1
 
1.3%

Address
Categorical

Distinct6618
Distinct (%)8.5%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
100 City Centre Dr
 
953
5100 Erin Mills Pky
 
523
7205 Goreway Dr
 
483
1250 South Service Rd
 
394
1550 South Gateway Rd
 
284
Other values (6613)
75395 

Length

Max length32
Median length27
Mean length16.625525
Min length5

Characters and Unicode

Total characters1297323
Distinct characters64
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique292 ?
Unique (%)0.4%

Sample

1st row300 Ambassador Dr
2nd row320 Ambassador Dr
3rd row320 Ambassador Dr
4th row320 Ambassador Dr
5th row321 Ambassador Dr

Common Values

ValueCountFrequency (%)
100 City Centre Dr 953
 
1.2%
5100 Erin Mills Pky 523
 
0.7%
7205 Goreway Dr 483
 
0.6%
1250 South Service Rd 394
 
0.5%
1550 South Gateway Rd 284
 
0.4%
4141 Dixie Rd 248
 
0.3%
2225 Erin Mills Pky 238
 
0.3%
50 Burnhamthorpe Rd W 229
 
0.3%
2355 Derry Rd E 212
 
0.3%
2000 Credit Valley Rd 212
 
0.3%
Other values (6608) 74256
95.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 28597
 
10.8%
dr 17907
 
6.8%
e 12047
 
4.6%
st 9954
 
3.8%
blvd 8013
 
3.0%
w 7245
 
2.7%
dundas 4805
 
1.8%
ave 3977
 
1.5%
matheson 2625
 
1.0%
pky 2579
 
1.0%
Other values (3761) 165836
62.9%

Most occurring characters

ValueCountFrequency (%)
185556
 
14.3%
r 77071
 
5.9%
e 71979
 
5.5%
a 58783
 
4.5%
d 55945
 
4.3%
0 51078
 
3.9%
n 49722
 
3.8%
5 48031
 
3.7%
t 47992
 
3.7%
i 45039
 
3.5%
Other values (54) 606127
46.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 636946
49.1%
Decimal Number 287140
22.1%
Uppercase Letter 187144
 
14.4%
Space Separator 185556
 
14.3%
Dash Punctuation 480
 
< 0.1%
Other Punctuation 54
 
< 0.1%
Modifier Symbol 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 77071
12.1%
e 71979
11.3%
a 58783
9.2%
d 55945
8.8%
n 49722
 
7.8%
t 47992
 
7.5%
i 45039
 
7.1%
o 36413
 
5.7%
l 32505
 
5.1%
s 27700
 
4.3%
Other values (15) 133797
21.0%
Uppercase Letter
ValueCountFrequency (%)
R 31751
17.0%
D 29023
15.5%
S 18789
10.0%
E 16442
8.8%
B 14485
7.7%
C 13381
7.2%
W 11748
 
6.3%
M 9512
 
5.1%
A 9382
 
5.0%
T 6499
 
3.5%
Other values (14) 26132
14.0%
Decimal Number
ValueCountFrequency (%)
0 51078
17.8%
5 48031
16.7%
1 41652
14.5%
2 31311
10.9%
3 25187
8.8%
6 23265
8.1%
7 20531
7.2%
4 17381
 
6.1%
9 14549
 
5.1%
8 14155
 
4.9%
Other Punctuation
ValueCountFrequency (%)
' 46
85.2%
. 8
 
14.8%
Space Separator
ValueCountFrequency (%)
185556
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 480
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 824090
63.5%
Common 473233
36.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 77071
 
9.4%
e 71979
 
8.7%
a 58783
 
7.1%
d 55945
 
6.8%
n 49722
 
6.0%
t 47992
 
5.8%
i 45039
 
5.5%
o 36413
 
4.4%
l 32505
 
3.9%
R 31751
 
3.9%
Other values (39) 316890
38.5%
Common
ValueCountFrequency (%)
185556
39.2%
0 51078
 
10.8%
5 48031
 
10.1%
1 41652
 
8.8%
2 31311
 
6.6%
3 25187
 
5.3%
6 23265
 
4.9%
7 20531
 
4.3%
4 17381
 
3.7%
9 14549
 
3.1%
Other values (5) 14692
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1297323
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
185556
 
14.3%
r 77071
 
5.9%
e 71979
 
5.5%
a 58783
 
4.5%
d 55945
 
4.3%
0 51078
 
3.9%
n 49722
 
3.8%
5 48031
 
3.7%
t 47992
 
3.7%
i 45039
 
3.5%
Other values (54) 606127
46.7%

StreetNo
Real number (ℝ)

Distinct3090
Distinct (%)4.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2946.1325
Minimum1
Maximum905629
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size609.8 KiB

Quantile statistics

Minimum1
5-th percentile57
Q11050
median2375
Q35100
95-th percentile7070
Maximum905629
Range905628
Interquartile range (IQR)4050

Descriptive statistics

Standard deviation: 3997.6662
Coefficient of variation (CV): 1.35692
Kurtosis: 33315.386
Mean: 2946.1325
Median Absolute Deviation (MAD): 1655
Skewness: 147.65244
Sum: 2.2989261 × 10⁸
Variance: 15981335
Monotonicity: Not monotonic
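
The skewness of StreetNo is driven by its maximum: 905629 occurs once, while the next largest value is 7895. One common way to flag such a value is Tukey's 1.5 × IQR rule. The report itself does not apply this rule; the sketch below is only an illustration using the quartiles reported above.

```python
# Quartiles and largest values copied from the StreetNo statistics in this report.
Q1, Q3 = 1050, 5100
LARGEST_VALUES = [905629, 7895, 7890]


def upper_fence(q1, q3, k=1.5):
    """Tukey's upper fence: values above q3 + k * IQR are treated as outliers."""
    return q3 + k * (q3 - q1)


fence = upper_fence(Q1, Q3)
outliers = [v for v in LARGEST_VALUES if v > fence]
print(fence, outliers)
```

With Q1 = 1050 and Q3 = 5100 the fence is 11175, so only 905629 is flagged.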
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
100 1101
 
1.4%
5100 601
 
0.8%
7205 520
 
0.7%
1250 448
 
0.6%
1 442
 
0.6%
2000 383
 
0.5%
1550 359
 
0.5%
50 313
 
0.4%
4141 310
 
0.4%
2425 304
 
0.4%
Other values (3080) 73251
93.9%
ValueCountFrequency (%)
1 442
0.6%
2 198
0.3%
3 200
0.3%
4 154
 
0.2%
5 7
 
< 0.1%
6 33
 
< 0.1%
7 25
 
< 0.1%
8 21
 
< 0.1%
9 20
 
< 0.1%
10 154
 
0.2%
ValueCountFrequency (%)
905629 1
 
< 0.1%
7895 138
0.2%
7890 7
 
< 0.1%
7885 79
0.1%
7880 6
 
< 0.1%
7875 30
 
< 0.1%
7860 5
 
< 0.1%
7855 5
 
< 0.1%
7850 4
 
< 0.1%
7840 1
 
< 0.1%

StreetName
Categorical

Distinct669
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
Dundas St E
 
3202
Matheson Blvd E
 
2125
Dixie Rd
 
1982
Hurontario St
 
1971
Lakeshore Rd E
 
1628
Other values (664)
67124 

Length

Max length26
Median length22
Mean length11.945035
Min length3

Characters and Unicode

Total characters932095
Distinct characters53
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique57 ?
Unique (%)0.1%

Sample

1st rowAmbassador Dr
2nd rowAmbassador Dr
3rd rowAmbassador Dr
4th rowAmbassador Dr
5th rowAmbassador Dr

Common Values

ValueCountFrequency (%)
Dundas St E 3202
 
4.1%
Matheson Blvd E 2125
 
2.7%
Dixie Rd 1982
 
2.5%
Hurontario St 1971
 
2.5%
Lakeshore Rd E 1628
 
2.1%
Dundas St W 1586
 
2.0%
City Centre Dr 1528
 
2.0%
Britannia Rd E 1441
 
1.8%
Tomken Rd 1416
 
1.8%
Argentia Rd 1400
 
1.8%
Other values (659) 59753
76.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 28598
 
15.4%
dr 17906
 
9.7%
e 12045
 
6.5%
st 9954
 
5.4%
blvd 8011
 
4.3%
w 7247
 
3.9%
dundas 4805
 
2.6%
ave 3978
 
2.1%
matheson 2625
 
1.4%
pky 2575
 
1.4%
Other values (665) 87802
47.3%

Most occurring characters

ValueCountFrequency (%)
107515
 
11.5%
r 77031
 
8.3%
e 71980
 
7.7%
a 58785
 
6.3%
d 55948
 
6.0%
n 49725
 
5.3%
t 47986
 
5.1%
i 45031
 
4.8%
o 36410
 
3.9%
l 32503
 
3.5%
Other values (43) 349181
37.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 636923
68.3%
Uppercase Letter 187126
 
20.1%
Space Separator 107515
 
11.5%
Dash Punctuation 480
 
0.1%
Other Punctuation 51
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 77031
12.1%
e 71980
11.3%
a 58785
9.2%
d 55948
8.8%
n 49725
 
7.8%
t 47986
 
7.5%
i 45031
 
7.1%
o 36410
 
5.7%
l 32503
 
5.1%
s 27702
 
4.3%
Other values (15) 133822
21.0%
Uppercase Letter
ValueCountFrequency (%)
R 31747
17.0%
D 29017
15.5%
S 18788
10.0%
E 16439
8.8%
B 14481
7.7%
C 13374
7.1%
W 11747
 
6.3%
M 9514
 
5.1%
A 9382
 
5.0%
T 6500
 
3.5%
Other values (14) 26137
14.0%
Other Punctuation
ValueCountFrequency (%)
' 45
88.2%
. 6
 
11.8%
Space Separator
ValueCountFrequency (%)
107515
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 824049
88.4%
Common 108046
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 77031
 
9.3%
e 71980
 
8.7%
a 58785
 
7.1%
d 55948
 
6.8%
n 49725
 
6.0%
t 47986
 
5.8%
i 45031
 
5.5%
o 36410
 
4.4%
l 32503
 
3.9%
R 31747
 
3.9%
Other values (39) 316903
38.5%
Common
ValueCountFrequency (%)
107515
99.5%
- 480
 
0.4%
' 45
 
< 0.1%
. 6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 932095
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
107515
 
11.5%
r 77031
 
8.3%
e 71980
 
7.7%
a 58785
 
6.3%
d 55948
 
6.0%
n 49725
 
5.3%
t 47986
 
5.1%
i 45031
 
4.8%
o 36410
 
3.9%
l 32503
 
3.5%
Other values (43) 349181
37.5%

BldgNo
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size76.3 KiB
Value   Count   Frequency (%)
False   73798   94.6%
True    4234    5.4%

UnitNo
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size76.3 KiB
Value   Count   Frequency (%)
True    53665   68.8%
False   24367   31.2%

PostalCode
Categorical

Distinct37
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
L4W
12410 
L5T
8326 
L5N
6083 
L4Z
4952 
L5L
4725 
Other values (32)
41536 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters234096
Distinct characters27
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)< 0.1%

Sample

1st rowL5T
2nd rowL5T
3rd rowL5T
4th rowL5T
5th rowL5T

Common Values

ValueCountFrequency (%)
L4W 12410
15.9%
L5T 8326
 
10.7%
L5N 6083
 
7.8%
L4Z 4952
 
6.3%
L5L 4725
 
6.1%
L5B 4593
 
5.9%
L5S 4273
 
5.5%
L5M 3805
 
4.9%
L4T 3318
 
4.3%
L5A 3293
 
4.2%
Other values (27) 22254
28.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
l4w 12410
15.9%
l5t 8326
 
10.7%
l5n 6083
 
7.8%
l4z 4952
 
6.3%
l5l 4725
 
6.1%
l5b 4593
 
5.9%
l5s 4273
 
5.5%
l5m 3805
 
4.9%
l4t 3318
 
4.3%
l5a 3293
 
4.2%
Other values (26) 22254
28.5%

Most occurring characters

ValueCountFrequency (%)
L 82758
35.4%
5 49935
21.3%
4 28079
 
12.0%
W 13005
 
5.6%
T 11645
 
5.0%
N 6083
 
2.6%
Z 4952
 
2.1%
B 4595
 
2.0%
S 4273
 
1.8%
M 3806
 
1.6%
Other values (17) 24965
 
10.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 156054
66.7%
Decimal Number 78037
33.3%
Lowercase Letter 5
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 82758
53.0%
W 13005
 
8.3%
T 11645
 
7.5%
N 6083
 
3.9%
Z 4952
 
3.2%
B 4595
 
2.9%
S 4273
 
2.7%
M 3806
 
2.4%
A 3293
 
2.1%
V 3177
 
2.0%
Other values (11) 18467
 
11.8%
Decimal Number
ValueCountFrequency (%)
5 49935
64.0%
4 28079
36.0%
6 21
 
< 0.1%
8 1
 
< 0.1%
7 1
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
c 5
100.0%
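
The character counts suggest that a few PostalCode values do not follow the letter-digit-letter shape of codes such as L5T. There are 5 lowercase "c" characters, and 78037 digits across 78032 three-character codes, five more than one digit per code. The sketch below shows one way to find such values. The pattern checks only the shape, not which letters are actually in use, and the two invalid samples are hypothetical because the report does not list the offending codes.

```python
import re

# Letter, digit, letter: the shape of codes such as "L5T" in this column.
FSA_PATTERN = re.compile(r"[A-Z][0-9][A-Z]")


def is_valid_fsa(code):
    """True when the code is exactly uppercase letter, digit, uppercase letter."""
    return FSA_PATTERN.fullmatch(code) is not None


# "L5T" and "L4W" appear in the report; the other two are hypothetical bad values.
samples = ["L5T", "L4W", "L5c", "L45"]
invalid = [code for code in samples if not is_valid_fsa(code)]
print(invalid)
```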

Most occurring scripts

ValueCountFrequency (%)
Latin 156059
66.7%
Common 78037
33.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 82758
53.0%
W 13005
 
8.3%
T 11645
 
7.5%
N 6083
 
3.9%
Z 4952
 
3.2%
B 4595
 
2.9%
S 4273
 
2.7%
M 3806
 
2.4%
A 3293
 
2.1%
V 3177
 
2.0%
Other values (12) 18472
 
11.8%
Common
ValueCountFrequency (%)
5 49935
64.0%
4 28079
36.0%
6 21
 
< 0.1%
8 1
 
< 0.1%
7 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 234096
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 82758
35.4%
5 49935
21.3%
4 28079
 
12.0%
W 13005
 
5.6%
T 11645
 
5.0%
N 6083
 
2.6%
Z 4952
 
2.1%
B 4595
 
2.0%
S 4273
 
1.8%
M 3806
 
1.6%
Other values (17) 24965
 
10.7%

Location
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct56
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
Northeast EA (West)
21104 
Western Business Park EA
5574 
Dixie EA
4786 
Gateway EA (East)
4760 
Meadowvale Business Park CC
4458 
Other values (51)
37350 

Length

Max length27
Median length23
Mean length16.691832
Min length7

Characters and Unicode

Total characters1302497
Distinct characters43
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGateway EA (East)
2nd rowGateway EA (East)
3rd rowGateway EA (East)
4th rowGateway EA (East)
5th rowGateway EA (East)

Common Values

ValueCountFrequency (%)
Northeast EA (West) 21104
27.0%
Western Business Park EA 5574
 
7.1%
Dixie EA 4786
 
6.1%
Gateway EA (East) 4760
 
6.1%
Meadowvale Business Park CC 4458
 
5.7%
DT Core 3192
 
4.1%
Airport CC 2231
 
2.9%
DT Cooksville 2065
 
2.6%
Northeast EA (East) 1926
 
2.5%
Mavis-Erindale EA 1822
 
2.3%
Other values (46) 26114
33.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ea 42359
19.4%
northeast 23030
 
10.5%
west 22942
 
10.5%
nhd 14276
 
6.5%
park 11037
 
5.1%
business 10032
 
4.6%
east 9149
 
4.2%
cc 7899
 
3.6%
gateway 6773
 
3.1%
dt 6142
 
2.8%
Other values (45) 64655
29.6%

Most occurring characters

ValueCountFrequency (%)
140262
 
10.8%
e 118825
 
9.1%
t 109250
 
8.4%
s 103821
 
8.0%
a 85081
 
6.5%
r 68155
 
5.2%
o 58559
 
4.5%
E 56161
 
4.3%
A 47775
 
3.7%
i 47770
 
3.7%
Other values (33) 466838
35.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 784844
60.3%
Uppercase Letter 312880
 
24.0%
Space Separator 140262
 
10.8%
Close Punctuation 30649
 
2.4%
Open Punctuation 30649
 
2.4%
Dash Punctuation 3213
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 118825
15.1%
t 109250
13.9%
s 103821
13.2%
a 85081
10.8%
r 68155
8.7%
o 58559
7.5%
i 47770
6.1%
l 32630
 
4.2%
n 29533
 
3.8%
h 27324
 
3.5%
Other values (11) 103896
13.2%
Uppercase Letter
ValueCountFrequency (%)
E 56161
17.9%
A 47775
15.3%
N 43931
14.0%
C 34625
11.1%
W 28516
9.1%
D 25204
8.1%
H 15708
 
5.0%
M 13549
 
4.3%
P 13477
 
4.3%
B 10032
 
3.2%
Other values (8) 23902
7.6%
Space Separator
ValueCountFrequency (%)
140262
100.0%
Close Punctuation
ValueCountFrequency (%)
) 30649
100.0%
Open Punctuation
ValueCountFrequency (%)
( 30649
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3213
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1097724
84.3%
Common 204773
 
15.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 118825
 
10.8%
t 109250
 
10.0%
s 103821
 
9.5%
a 85081
 
7.8%
r 68155
 
6.2%
o 58559
 
5.3%
E 56161
 
5.1%
A 47775
 
4.4%
i 47770
 
4.4%
N 43931
 
4.0%
Other values (29) 358396
32.6%
Common
ValueCountFrequency (%)
140262
68.5%
) 30649
 
15.0%
( 30649
 
15.0%
- 3213
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1302497
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
140262
 
10.8%
e 118825
 
9.1%
t 109250
 
8.4%
s 103821
 
8.0%
a 85081
 
6.5%
r 68155
 
5.2%
o 58559
 
4.5%
E 56161
 
4.3%
A 47775
 
3.7%
i 47770
 
3.7%
Other values (33) 466838
35.8%

Ward
Real number (ℝ)

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.3913395
Minimum1
Maximum11
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size609.8 KiB

Quantile statistics

Minimum1
5-th percentile1
Q15
median5
Q37
95-th percentile11
Maximum11
Range10
Interquartile range (IQR)2

Descriptive statistics

Standard deviation2.4758594
Coefficient of variation (CV)0.459229
Kurtosis0.01057504
Mean5.3913395
Median Absolute Deviation (MAD)1
Skewness0.34308626
Sum420697
Variance6.12988
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
5 33956
43.5%
1 6772
 
8.7%
8 6086
 
7.8%
7 5561
 
7.1%
3 5005
 
6.4%
9 4687
 
6.0%
11 4300
 
5.5%
4 4163
 
5.3%
6 3584
 
4.6%
2 3163
 
4.1%
ValueCountFrequency (%)
1 6772
 
8.7%
2 3163
 
4.1%
3 5005
 
6.4%
4 4163
 
5.3%
5 33956
43.5%
6 3584
 
4.6%
7 5561
 
7.1%
8 6086
 
7.8%
9 4687
 
6.0%
10 755
 
1.0%
ValueCountFrequency (%)
11 4300
 
5.5%
10 755
 
1.0%
9 4687
 
6.0%
8 6086
 
7.8%
7 5561
 
7.1%
6 3584
 
4.6%
5 33956
43.5%
4 4163
 
5.3%
3 5005
 
6.4%
2 3163
 
4.1%

NAICSCode
Real number (ℝ)

Distinct24
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean52.937603
Minimum11
Maximum91
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size609.8 KiB

Quantile statistics

Minimum11
5-th percentile31
Q141
median52
Q362
95-th percentile81
Maximum91
Range80
Interquartile range (IQR)21

Descriptive statistics

Standard deviation15.992614
Coefficient of variation (CV)0.3021031
Kurtosis-0.68005714
Mean52.937603
Median Absolute Deviation (MAD)10
Skewness0.30420754
Sum4130827
Variance255.7637
MonotonicityNot monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
81 9052
11.6%
44 9014
11.6%
41 8749
11.2%
54 7102
9.1%
62 6459
 
8.3%
72 6148
 
7.9%
33 5710
 
7.3%
61 3050
 
3.9%
52 2995
 
3.8%
48 2889
 
3.7%
Other values (14) 16864
21.6%
ValueCountFrequency (%)
11 6
 
< 0.1%
21 15
 
< 0.1%
22 63
 
0.1%
23 2783
 
3.6%
31 1144
 
1.5%
32 2828
 
3.6%
33 5710
7.3%
41 8749
11.2%
44 9014
11.6%
45 2057
 
2.6%
ValueCountFrequency (%)
91 460
 
0.6%
81 9052
11.6%
72 6148
7.9%
71 1039
 
1.3%
62 6459
8.3%
61 3050
 
3.9%
56 2607
 
3.3%
55 504
 
0.6%
54 7102
9.1%
53 1838
 
2.4%

NAICSCat
Categorical

Distinct19
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
Retail Trade
11071 
Manufacturing
9682 
Other Services
9053 
Wholesale Trade
8749 
Professional, Scientific and Technical Services
7102 
Other values (14)
32375 

Length

Max length69
Median length35
Mean length23.769364
Min length9

Characters and Unicode

Total characters1854771
Distinct characters37
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
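
As an illustration of the kind of per-character analysis summarised in this section, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the two sample strings are chosen for illustration; this is not the code the report generator runs. The standard library reports two-letter codes (`Lu` for Uppercase Letter, `Ll` for Lowercase Letter, `Zs` for Space Separator, `Po` for Other Punctuation) and does not expose scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in the values."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two category labels taken from the NAICSCat column.
sample = ["Retail Trade", "Professional, Scientific and Technical Services"]
counts = category_counts(sample)
for code, count in counts.most_common():
    print(code, count)
```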

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Wholesale Trade
2nd row: Manufacturing
3rd row: Manufacturing
4th row: Manufacturing
5th row: Wholesale Trade

Common Values

Value | Count | Frequency (%)
Retail Trade | 11071 | 14.2%
Manufacturing | 9682 | 12.4%
Other Services | 9053 | 11.6%
Wholesale Trade | 8749 | 11.2%
Professional, Scientific and Technical Services | 7102 | 9.1%
Health Care and Social Assistance | 6459 | 8.3%
Accommodation and Food Services | 6148 | 7.9%
Transportation and Warehousing | 3789 | 4.9%
Educational Services | 3050 | 3.9%
Finance and Insurance | 2995 | 3.8%
Other values (9) | 9934 | 12.7%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
and | 37544 | 16.2%
services | 27960 | 12.1%
trade | 19820 | 8.6%
retail | 11071 | 4.8%
manufacturing | 9682 | 4.2%
other | 9053 | 3.9%
wholesale | 8749 | 3.8%
professional | 7102 | 3.1%
scientific | 7102 | 3.1%
technical | 7102 | 3.1%
Other values (35) | 85934 | 37.2%

Most occurring characters

Value | Count | Frequency (%)
e | 194173 | 10.5%
a | 191887 | 10.3%
(space) | 153087 | 8.3%
n | 145859 | 7.9%
i | 140943 | 7.6%
r | 108943 | 5.9%
c | 104586 | 5.6%
t | 102451 | 5.5%
s | 96869 | 5.2%
o | 89096 | 4.8%
Other values (27) | 526877 | 28.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1497865 | 80.8%
Uppercase Letter | 193071 | 10.4%
Space Separator | 153087 | 8.3%
Other Punctuation | 10748 | 0.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 194173 | 13.0%
a | 191887 | 12.8%
n | 145859 | 9.7%
i | 140943 | 9.4%
r | 108943 | 7.3%
c | 104586 | 7.0%
t | 102451 | 6.8%
s | 96869 | 6.5%
o | 89096 | 5.9%
d | 79025 | 5.3%
Other values (10) | 244033 | 16.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 44128 | 22.9%
T | 30711 | 15.9%
R | 18391 | 9.5%
A | 16713 | 8.7%
W | 15145 | 7.8%
M | 12793 | 6.6%
C | 10366 | 5.4%
F | 9143 | 4.7%
O | 9053 | 4.7%
P | 7583 | 3.9%
Other values (5) | 19045 | 9.9%

Space Separator
Value | Count | Frequency (%)
(space) | 153087 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 10748 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1690936 | 91.2%
Common | 163835 | 8.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 194173 | 11.5%
a | 191887 | 11.3%
n | 145859 | 8.6%
i | 140943 | 8.3%
r | 108943 | 6.4%
c | 104586 | 6.2%
t | 102451 | 6.1%
s | 96869 | 5.7%
o | 89096 | 5.3%
d | 79025 | 4.7%
Other values (25) | 437104 | 25.8%

Common
Value | Count | Frequency (%)
(space) | 153087 | 93.4%
, | 10748 | 6.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1854771 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 194173 | 10.5%
a | 191887 | 10.3%
(space) | 153087 | 8.3%
n | 145859 | 7.9%
i | 140943 | 7.6%
r | 108943 | 5.9%
c | 104586 | 5.6%
t | 102451 | 5.5%
s | 96869 | 5.2%
o | 89096 | 4.8%
Other values (27) | 526877 | 28.4%

NAICSDescr
Categorical

Distinct: 1039
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

Limited-service eating places | 3647
General Automotive Repair | 1992
Full-service restaurants | 1777
Offices of Dentists | 1603
Offices of Physicians | 1504
Other values (1034) | 67509

Length

Max length: 175
Median length: 80
Mean length: 35.436385
Min length: 6

Characters and Unicode

Total characters: 2765172
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 0.2%

Sample

1st row: Amusement and Sporting Goods Wholesaler-Distributors
2nd row: Support Activities for Printing
3rd row: Support Activities for Printing
4th row: Other Printing
5th row: Industrial Machinery, Equipment and Supplies Wholesaler-Distributors

Common Values

Value | Count | Frequency (%)
Limited-service eating places | 3647 | 4.7%
General Automotive Repair | 1992 | 2.6%
Full-service restaurants | 1777 | 2.3%
Offices of Dentists | 1603 | 2.1%
Offices of Physicians | 1504 | 1.9%
Offices of Lawyers | 1376 | 1.8%
Beauty Salons | 1302 | 1.7%
Other Freight Transportation Arrangement | 1255 | 1.6%
Elementary and Secondary Schools | 1240 | 1.6%
Religious Organizations | 1098 | 1.4%
Other values (1029) | 61238 | 78.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
and | 33347 | 10.0%
other | 18681 | 5.6%
stores | 9245 | 2.8%
offices | 8694 | 2.6%
of | 8405 | 2.5%
services | 8315 | 2.5%
all | 8273 | 2.5%
wholesaler-distributors | 7178 | 2.1%
manufacturing | 6730 | 2.0%
supplies | 4486 | 1.3%
Other values (1054) | 221747 | 66.2%

Most occurring characters

Value | Count | Frequency (%)
e | 278627 | 10.1%
(space) | 258164 | 9.3%
i | 198022 | 7.2%
r | 189307 | 6.8%
n | 183101 | 6.6%
t | 181749 | 6.6%
a | 181007 | 6.5%
s | 160174 | 5.8%
o | 139412 | 5.0%
l | 115516 | 4.2%
Other values (51) | 880093 | 31.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 2193494 | 79.3%
Uppercase Letter | 276079 | 10.0%
Space Separator | 258605 | 9.4%
Dash Punctuation | 17709 | 0.6%
Other Punctuation | 11390 | 0.4%
Open Punctuation | 4149 | 0.2%
Close Punctuation | 3340 | 0.1%
Control | 406 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 278627 | 12.7%
i | 198022 | 9.0%
r | 189307 | 8.6%
n | 183101 | 8.3%
t | 181749 | 8.3%
a | 181007 | 8.3%
s | 160174 | 7.3%
o | 139412 | 6.4%
l | 115516 | 5.3%
c | 105666 | 4.8%
Other values (16) | 460913 | 21.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 38648 | 14.0%
O | 30856 | 11.2%
A | 24817 | 9.0%
C | 24436 | 8.9%
M | 21775 | 7.9%
P | 18986 | 6.9%
D | 14648 | 5.3%
W | 12588 | 4.6%
E | 11736 | 4.3%
F | 11266 | 4.1%
Other values (15) | 66323 | 24.0%

Other Punctuation
Value | Count | Frequency (%)
, | 9665 | 84.9%
' | 803 | 7.1%
& | 488 | 4.3%
. | 434 | 3.8%

Space Separator
Value | Count | Frequency (%)
(space) | 258164 | 99.8%
(non-ASCII space) | 441 | 0.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 17709 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 4149 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3340 | 100.0%

Control
Value | Count | Frequency (%)
(control character) | 406 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2469573 | 89.3%
Common | 295599 | 10.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 278627 | 11.3%
i | 198022 | 8.0%
r | 189307 | 7.7%
n | 183101 | 7.4%
t | 181749 | 7.4%
a | 181007 | 7.3%
s | 160174 | 6.5%
o | 139412 | 5.6%
l | 115516 | 4.7%
c | 105666 | 4.3%
Other values (41) | 736992 | 29.8%

Common
Value | Count | Frequency (%)
(space) | 258164 | 87.3%
- | 17709 | 6.0%
, | 9665 | 3.3%
( | 4149 | 1.4%
) | 3340 | 1.1%
' | 803 | 0.3%
& | 488 | 0.2%
(non-ASCII space) | 441 | 0.1%
. | 434 | 0.1%
(control character) | 406 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2764731 | > 99.9%
None | 441 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 278627 | 10.1%
(space) | 258164 | 9.3%
i | 198022 | 7.2%
r | 189307 | 6.8%
n | 183101 | 6.6%
t | 181749 | 6.6%
a | 181007 | 6.5%
s | 160174 | 5.8%
o | 139412 | 5.0%
l | 115516 | 4.2%
Other values (50) | 879652 | 31.8%

None
Value | Count | Frequency (%)
(non-ASCII space) | 441 | 100.0%

Phone
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

True | 77399
False | 633

Value | Count | Frequency (%)
True | 77399 | 99.2%
False | 633 | 0.8%

Fax
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

True | 50803
False | 27229

Value | Count | Frequency (%)
True | 50803 | 65.1%
False | 27229 | 34.9%

TollFree
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

False | 66596
True | 11436

Value | Count | Frequency (%)
False | 66596 | 85.3%
True | 11436 | 14.7%

EMail
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

True | 47406
False | 30626

Value | Count | Frequency (%)
True | 47406 | 60.8%
False | 30626 | 39.2%

WebAddress
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

True | 56765
False | 21267

Value | Count | Frequency (%)
True | 56765 | 72.7%
False | 21267 | 27.3%

EmplRange
Real number (ℝ)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1437743
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 3
95-th percentile: 5
Maximum: 9
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.4384102
Coefficient of variation (CV): 0.67097088
Kurtosis: 1.3215574
Mean: 2.1437743
Median Absolute Deviation (MAD): 1
Skewness: 1.3064146
Sum: 167283
Variance: 2.0690238
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
1 | 37312 | 47.8%
2 | 16050 | 20.6%
3 | 10510 | 13.5%
4 | 8120 | 10.4%
5 | 3313 | 4.2%
6 | 2149 | 2.8%
7 | 318 | 0.4%
8 | 164 | 0.2%
9 | 96 | 0.1%

Value | Count | Frequency (%)
1 | 37312 | 47.8%
2 | 16050 | 20.6%
3 | 10510 | 13.5%
4 | 8120 | 10.4%
5 | 3313 | 4.2%
6 | 2149 | 2.8%
7 | 318 | 0.4%
8 | 164 | 0.2%
9 | 96 | 0.1%

Value | Count | Frequency (%)
9 | 96 | 0.1%
8 | 164 | 0.2%
7 | 318 | 0.4%
6 | 2149 | 2.8%
5 | 3313 | 4.2%
4 | 8120 | 10.4%
3 | 10510 | 13.5%
2 | 16050 | 20.6%
1 | 37312 | 47.8%

CENT_X
Real number (ℝ)

Distinct: 9085
Distinct (%): 11.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 608667.3
Minimum: 596627.93
Maximum: 617060.11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 596627.93
5-th percentile: 601477.61
Q1: 606588.08
Median: 609003.29
Q3: 611252.11
95-th percentile: 614719.98
Maximum: 617060.11
Range: 20432.171
Interquartile range (IQR): 4664.0241

Descriptive statistics

Standard deviation: 3790.1103
Coefficient of variation (CV): 0.0062268999
Kurtosis: 0.047766822
Mean: 608667.3
Median Absolute Deviation (MAD): 2330.119
Skewness: -0.44615725
Sum: 4.7495526 × 10^10
Variance: 14364936
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
609549.2555 | 1701 | 2.2%
609556.5032 | 709 | 0.9%
612552.1674 | 532 | 0.7%
604009.418 | 436 | 0.6%
609657.7584 | 354 | 0.5%
615480.8966 | 322 | 0.4%
611454.4028 | 277 | 0.4%
611830.703 | 223 | 0.3%
608539.0792 | 209 | 0.3%
612581.1624 | 209 | 0.3%
Other values (9075) | 73060 | 93.6%

Value | Count | Frequency (%)
596627.9342 | 4 | < 0.1%
596636.3174 | 1 | < 0.1%
596752.9696 | 4 | < 0.1%
596761.7476 | 1 | < 0.1%
597263.154 | 1 | < 0.1%
597309.0542 | 6 | < 0.1%
597312.632 | 3 | < 0.1%
597730.9671 | 23 | < 0.1%
597763.1149 | 2 | < 0.1%
597772.3526 | 111 | 0.1%

Value | Count | Frequency (%)
617060.1055 | 1 | < 0.1%
616985.0552 | 16 | < 0.1%
616918.4738 | 1 | < 0.1%
616917.8604 | 4 | < 0.1%
616879.86 | 3 | < 0.1%
616839.6893 | 1 | < 0.1%
616837.5953 | 1 | < 0.1%
616836.9092 | 5 | < 0.1%
616794.193 | 4 | < 0.1%
616769.3441 | 1 | < 0.1%

CENT_Y
Real number (ℝ)

Distinct: 12366
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4829527.1
Minimum: 4815546.6
Maximum: 4843107.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 4815546.6
5-th percentile: 4819128.2
Q1: 4825873.6
Median: 4829298.9
Q3: 4833808.1
95-th percentile: 4839298.2
Maximum: 4843107.8
Range: 27561.199
Interquartile range (IQR): 7934.5206

Descriptive statistics

Standard deviation: 5795.145
Coefficient of variation (CV): 0.0011999405
Kurtosis: -0.63809248
Mean: 4829527.1
Median Absolute Deviation (MAD): 3954.81
Skewness: -0.052466864
Sum: 3.7685766 × 10^11
Variance: 33583706
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4819128.214 | 1701 | 2.2%
4837278.362 | 532 | 0.7%
4827620.949 | 520 | 0.7%
4823628.592 | 323 | 0.4%
4825810.215 | 277 | 0.4%
4841687.188 | 244 | 0.3%
4827728.859 | 231 | 0.3%
4827535.97 | 201 | 0.3%
4827620.949 | 189 | 0.2%
4831996.045 | 179 | 0.2%
Other values (12356) | 73635 | 94.4%

Value | Count | Frequency (%)
4815546.641 | 3 | < 0.1%
4815549.405 | 1 | < 0.1%
4815601.213 | 1 | < 0.1%
4815609.051 | 3 | < 0.1%
4815609.051 | 1 | < 0.1%
4816100.511 | 1 | < 0.1%
4816109.607 | 4 | < 0.1%
4816303.869 | 1 | < 0.1%
4816333.508 | 4 | < 0.1%
4816361.694 | 1 | < 0.1%

Value | Count | Frequency (%)
4843107.84 | 26 | < 0.1%
4843107.84 | 10 | < 0.1%
4843106.933 | 3 | < 0.1%
4843045.912 | 1 | < 0.1%
4843040.829 | 3 | < 0.1%
4843040.829 | 1 | < 0.1%
4842998.68 | 3 | < 0.1%
4842998.68 | 1 | < 0.1%
4842995.781 | 2 | < 0.1%
4842855.077 | 1 | < 0.1%

Year
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

2019 | 16518
2018 | 16350
2017 | 15737
2021 | 14825
2016 | 14602

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 312128
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016
2nd row: 2016
3rd row: 2016
4th row: 2016
5th row: 2016

Common Values

Value | Count | Frequency (%)
2019 | 16518 | 21.2%
2018 | 16350 | 21.0%
2017 | 15737 | 20.2%
2021 | 14825 | 19.0%
2016 | 14602 | 18.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2019 | 16518 | 21.2%
2018 | 16350 | 21.0%
2017 | 15737 | 20.2%
2021 | 14825 | 19.0%
2016 | 14602 | 18.7%

Most occurring characters

Value | Count | Frequency (%)
2 | 92857 | 29.7%
0 | 78032 | 25.0%
1 | 78032 | 25.0%
9 | 16518 | 5.3%
8 | 16350 | 5.2%
7 | 15737 | 5.0%
6 | 14602 | 4.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 312128 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 92857 | 29.7%
0 | 78032 | 25.0%
1 | 78032 | 25.0%
9 | 16518 | 5.3%
8 | 16350 | 5.2%
7 | 15737 | 5.0%
6 | 14602 | 4.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 312128 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 92857 | 29.7%
0 | 78032 | 25.0%
1 | 78032 | 25.0%
9 | 16518 | 5.3%
8 | 16350 | 5.2%
7 | 15737 | 5.0%
6 | 14602 | 4.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 312128 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 92857 | 29.7%
0 | 78032 | 25.0%
1 | 78032 | 25.0%
9 | 16518 | 5.3%
8 | 16350 | 5.2%
7 | 15737 | 5.0%
6 | 14602 | 4.7%

Age
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

1 | 21240
2 | 18801
3 | 15727
4 | 12761
5 | 9503

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 78032
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 21240 | 27.2%
2 | 18801 | 24.1%
3 | 15727 | 20.2%
4 | 12761 | 16.4%
5 | 9503 | 12.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 21240 | 27.2%
2 | 18801 | 24.1%
3 | 15727 | 20.2%
4 | 12761 | 16.4%
5 | 9503 | 12.2%

Most occurring characters

Value | Count | Frequency (%)
1 | 21240 | 27.2%
2 | 18801 | 24.1%
3 | 15727 | 20.2%
4 | 12761 | 16.4%
5 | 9503 | 12.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 78032 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 21240 | 27.2%
2 | 18801 | 24.1%
3 | 15727 | 20.2%
4 | 12761 | 16.4%
5 | 9503 | 12.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 78032 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 21240 | 27.2%
2 | 18801 | 24.1%
3 | 15727 | 20.2%
4 | 12761 | 16.4%
5 | 9503 | 12.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 78032 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 21240 | 27.2%
2 | 18801 | 24.1%
3 | 15727 | 20.2%
4 | 12761 | 16.4%
5 | 9503 | 12.2%

isnew
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

False | 71148
True | 6884

Value | Count | Frequency (%)
False | 71148 | 91.2%
True | 6884 | 8.8%

Closed
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.3 KiB

False | 71617
True | 6415

Value | Count | Frequency (%)
False | 71617 | 91.8%
True | 6415 | 8.2%

Interactions

[Figure: grid of pairwise interaction plots, 121 panels]

Correlations


Auto

The auto setting selects an interpretable association measure for each pair of columns according to their types:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
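
The mapping above can be sketched as a small dispatch function, together with a binning step for the Numerical-Categorical case. Both `auto_method` and `discretize` are hypothetical helpers written for this illustration, and equal-width binning is an assumption, since the report text does not say which binning scheme is applied.

```python
def auto_method(type_a, type_b):
    """Return the association measure for a pair of column types."""
    if type_a == "Numerical" and type_b == "Numerical":
        return "Spearman's rho"
    # Every pair that involves a categorical column uses Cramer's V;
    # a numerical partner is discretized into bins first.
    return "Cramer's V"


def discretize(values, n_bins):
    """Map numbers to bin indices 0..n_bins-1 (equal-width bins, an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(auto_method("Numerical", "Categorical"))
print(discretize([1, 2, 3, 4, 9], n_bins=4))
```

A larger `n_bins` preserves more detail of the numerical column but leaves fewer observations per cell of the resulting contingency table.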

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
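
A minimal pure-Python sketch of that calculation follows. It assigns tied values the mean of their rank positions, which is a common convention assumed here rather than stated in the report; the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A cubic relationship is nonlinear but monotonic.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```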

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
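
The calculation can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample data are illustrative; the second call checks the invariance under shifting and positive rescaling.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
r_rescaled = pearson_r(x, [3 * v + 7 for v in y])  # shift and positive rescale
r_flipped = pearson_r(x, [-v for v in y])          # negative scale flips the sign
print(r, r_rescaled, r_flipped)
```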

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
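
The definition translates directly into a pairwise count. The sketch below implements exactly that formula (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations frequently use a tie-adjusted variant instead, so results may differ on data with ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, checked pair by pair in O(n^2)."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / pairs


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```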

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
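
For illustration, a pure-Python sketch of a bias-corrected Cramér's V is given below. It follows the correction commonly attributed to Bergsma (2013), in which both φ² and the table dimensions are adjusted by terms of order 1/(n-1). The function name is hypothetical, and this is not necessarily the exact code path used to produce the report.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramer's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association
    return sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(perfect, independent)
```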

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | RecordID | X | Y | FID | BusinessID | Name | Address | StreetNo | StreetName | BldgNo | UnitNo | PostalCode | Location | Ward | NAICSCode | NAICSCat | NAICSDescr | Phone | Fax | TollFree | EMail | WebAddress | EmplRange | CENT_X | CENT_Y | Year | Age | isnew | Closed
0 | 1 | -79.689829 | 43.644181 | 1 | 1055 | Golf Trends Inc. | 300 Ambassador Dr | 300 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 41 | Wholesale Trade | Amusement and Sporting Goods Wholesaler-Distributors | Yes | Yes | Yes | Yes | Yes | 3 | 605668.2538 | 4.833187e+06 | 2016 | 1 | No | No
1 | 2 | -79.689419 | 43.644988 | 2 | 1057 | Apex Graphics Inc. | 320 Ambassador Dr | 320 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 32 | Manufacturing | Support Activities for Printing | Yes | Yes | No | Yes | Yes | 4 | 605699.9370 | 4.833277e+06 | 2016 | 1 | No | No
2 | 3 | -79.689419 | 43.644988 | 3 | 1058 | Sands, John & Associates Limited | 320 Ambassador Dr | 320 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 32 | Manufacturing | Support Activities for Printing | Yes | Yes | No | No | No | 5 | 605699.9370 | 4.833277e+06 | 2016 | 1 | No | No
3 | 4 | -79.689419 | 43.644988 | 4 | 1060 | Printmedia-Tackaberry Times | 320 Ambassador Dr | 320 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 32 | Manufacturing | Other Printing | Yes | Yes | No | Yes | Yes | 1 | 605699.9370 | 4.833277e+06 | 2016 | 1 | No | No
4 | 5 | -79.690664 | 43.645493 | 5 | 1061 | S W R Industries Ltd. | 321 Ambassador Dr | 321 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 41 | Wholesale Trade | Industrial Machinery, Equipment and Supplies Wholesaler-Distributors | Yes | Yes | No | Yes | Yes | 2 | 605598.6442 | 4.833332e+06 | 2016 | 1 | No | No
5 | 6 | -79.690277 | 43.646372 | 6 | 1063 | Crossdock Freight Solutions | 361 Ambassador Dr | 361 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 48 | Transportation and Warehousing | Other Freight Transportation Arrangement | Yes | Yes | No | Yes | Yes | 4 | 605628.2838 | 4.833430e+06 | 2016 | 1 | No | No
6 | 7 | -79.689877 | 43.646914 | 7 | 1065 | Green Belting Industries Ltd. | 381 Ambassador Dr | 381 | Ambassador Dr | No | No | L5T | Gateway EA (East) | 5 | 32 | Manufacturing | Paint and Coating Manufacturing | Yes | Yes | Yes | Yes | Yes | 5 | 605659.5646 | 4.833490e+06 | 2016 | 1 | No | No
7 | 8 | -79.634279 | 43.640404 | 8 | 1073 | Dafco Filtration Group Corporation | 5390 Ambler Dr | 5390 | Ambler Dr | No | Yes | L4W | Northeast EA (West) | 5 | 33 | Manufacturing | Industrial and Commercial Fan and Blower and Air Purification Equipment Manufacturing | Yes | Yes | No | Yes | Yes | 5 | 610155.4182 | 4.832840e+06 | 2016 | 1 | No | No
8 | 9 | -79.632844 | 43.641337 | 9 | 1074 | Ace Trans Inc. | 5391 Ambler Dr | 5391 | Ambler Dr | No | Yes | L4W | Northeast EA (West) | 5 | 49 | Transportation and Warehousing | General Warehousing and Storage | Yes | Yes | No | Yes | Yes | 1 | 610269.4640 | 4.832945e+06 | 2016 | 1 | No | No
9 | 10 | -79.637815 | 43.642638 | 10 | 1077 | Petro Maxx | 5510 Ambler Dr | 5510 | Ambler Dr | No | Yes | L4W | Northeast EA (West) | 5 | 54 | Professional, Scientific and Technical Services | Other Specialized Design Services | Yes | No | No | Yes | Yes | 4 | 609866.1452 | 4.833083e+06 | 2016 | 1 | No | No
| | RecordID | X | Y | FID | BusinessID | Name | Address | StreetNo | StreetName | BldgNo | UnitNo | PostalCode | Location | Ward | NAICSCode | NAICSCat | NAICSDescr | Phone | Fax | TollFree | EMail | WebAddress | EmplRange | CENT_X | CENT_Y | Year | Age | isnew | Closed |
| 78022 | 78023 | -79.652774 | 43.709466 | 14816 | 57550 | Advance Car & Truck Rental | 2960 Drew Rd | 2960 | Drew Rd | No | Yes | L4T | Northeast EA (West) | 5 | 53 | Real Estate and Rental and Leasing | Passenger Car Rental | Yes | Yes | Yes | Yes | Yes | 1 | 608544.3664 | 4.840490e+06 | 2021 | 5 | No | No |
| 78023 | 78024 | -79.652774 | 43.709466 | 14817 | 57551 | Video Palace | 2960 Drew Rd | 2960 | Drew Rd | No | Yes | L4T | Northeast EA (West) | 5 | 53 | Real Estate and Rental and Leasing | All Other Consumer Goods Rental | Yes | No | No | No | No | 1 | 608544.3664 | 4.840490e+06 | 2021 | 3 | No | No |
| 78024 | 78025 | -79.652774 | 43.709466 | 14818 | 57552 | Secure Life Insurance Agency Inc. | 2960 Drew Rd | 2960 | Drew Rd | No | Yes | L4T | Northeast EA (West) | 5 | 52 | Finance and Insurance | Direct Group Life, Health and Medical Insurance Carriers | Yes | Yes | Yes | No | Yes | 1 | 608544.3664 | 4.840490e+06 | 2021 | 5 | No | No |
| 78025 | 78026 | -79.652774 | 43.709466 | 14819 | 57555 | Skillman Flooring | 2960 Drew Rd | 2960 | Drew Rd | No | Yes | L4T | Northeast EA (West) | 5 | 44 | Retail Trade | Floor Covering Stores | Yes | Yes | No | Yes | Yes | 1 | 608544.3664 | 4.840490e+06 | 2021 | 5 | No | No |
| 78026 | 78027 | -79.652774 | 43.709466 | 14820 | 57557 | Verma Vastar Manufacturing Inc. | 2960 Drew Rd | 2960 | Drew Rd | No | Yes | L4T | Northeast EA (West) | 5 | 31 | Manufacturing | Cut and Sew Clothing Contracting | Yes | No | No | No | No | 1 | 608544.3664 | 4.840490e+06 | 2021 | 4 | No | No |
| 78027 | 78028 | -79.652774 | 0.000000 | 14821 | 60142 | JobsForU | 2960 Drew Rd | 2960 | Drew Rd | No | Yes | L4T | Northeast EA (East) | 5 | 56 | Administrative and Support, Waste Management and Remediation Services | Employment Placement Agencies and Executive Search Services | Yes | No | No | Yes | Yes | 3 | 608544.3664 | 4.840490e+06 | 2021 | 1 | Yes | No |
| 78028 | 78029 | -79.652774 | 0.000000 | 14822 | 60159 | Elite Source Solutions | 2980 Drew Rd | 2980 | Drew Rd | No | Yes | L4T | Northeast EA (East) | 5 | 56 | Administrative and Support, Waste Management and Remediation Services | Employment Placement Agencies and Executive Search Services | Yes | No | No | No | No | 1 | 608544.3664 | 4.840490e+06 | 2021 | 1 | Yes | No |
| 78029 | 78030 | -79.652774 | 0.000000 | 14823 | 60160 | Indian Sweet Master | 2980 Drew Rd | 2980 | Drew Rd | No | Yes | L4T | Northeast EA (East) | 5 | 72 | Accommodation and Food Services | Full-service restaurants | Yes | No | No | No | No | 1 | 608544.3664 | 4.840490e+06 | 2021 | 1 | Yes | No |
| 78030 | 78031 | -79.652774 | 0.000000 | 14824 | 60161 | Mississauga Flooring & Supplies Inc. | 2980 Drew Rd | 2980 | Drew Rd | No | Yes | L4T | Northeast EA (East) | 5 | 41 | Wholesale Trade | Floor Covering Wholesaler-Distributors | Yes | No | No | No | No | 1 | 608544.3664 | 4.840490e+06 | 2021 | 1 | Yes | No |
| 78031 | 78032 | -79.652774 | 0.000000 | 14825 | 60162 | Punjabi Textile Ltd. | 2980 Drew Rd | 2980 | Drew Rd | No | Yes | L4T | Northeast EA (East) | 5 | 41 | Wholesale Trade | Clothing and Clothing Accessories Wholesaler-Distributors | Yes | No | No | No | No | 1 | 608544.3664 | 4.840490e+06 | 2021 | 1 | Yes | No |